 object landmark


Unsupervised Learning of Object Landmarks through Conditional Image Generation

Neural Information Processing Systems

We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the two examples differ by a viewpoint change and/or an object deformation. In order to factorize appearance and geometry, we introduce a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features. Compared to standard image generation problems, which often use generative adversarial networks, our generation task is conditioned on both appearance and geometry and thus is significantly less ambiguous, to the point that adopting a simple perceptual loss formulation is sufficient. We demonstrate that our approach can learn object landmarks from synthetic image deformations or videos, all without manual supervision, while outperforming state-of-the-art unsupervised landmark detectors. We further show that our method is applicable to a large variety of datasets - faces, people, 3D objects, and digits - without any modifications.



Review for NeurIPS paper: Unsupervised Learning of Object Landmarks via Self-Training Correspondence

Neural Information Processing Systems

Additional Feedback: Detailed feedback: 1. Authors state that "[other] methods, despite presenting consistent results for various object categories, have also their own limitations such as discovering landmarks with no clear semantic meaning." This claim is rather strong, since the proposed method also does not guarantee any clear semantic meaning for the object landmarks it discovers. That work actually reports better accuracy on BBCPose than the proposed method and hence should also be included. Since the paper is trying to distinguish between "keypoints/landmarks" and "object landmarks", it would be helpful to have clear definitions and to use the terms consistently. For example, in the introduction the three terms are used interchangeably, but then in Section 3 "keypoints and landmarks" refer to very different entities than "object landmarks".


Review for NeurIPS paper: Unsupervised Learning of Object Landmarks via Self-Training Correspondence

Neural Information Processing Systems

This submission proposes an approach to unsupervised object landmark discovery. It initially received four reviews with mixed positive and negative scores (6,7,5,5). The rebuttal addressed some of the remaining concerns, which resulted in an increase in scores to (7,7,6,6). For these reasons, the AC's recommendation is to accept this submission for presentation as a poster, with a request for the authors to carefully revise the manuscript for the camera-ready version to address the remaining concerns of the reviewers and improve the clarity of the presentation.


Unsupervised Learning of Object Landmarks via Self-Training Correspondence

Neural Information Processing Systems

This paper addresses the problem of unsupervised discovery of object landmarks. We take a different path compared to that of existing works, based on two novel perspectives: (1) Self-training: starting from generic keypoints, we propose a self-training approach where the goal is to learn a detector that improves itself, becoming more and more tuned to object landmarks. Compared to previous works, our approach can learn landmarks that are more flexible in terms of capturing large changes in viewpoint. We show the favourable properties of our method on a variety of difficult datasets including LS3D, BBCPose and Human3.6M.


Reviews: Unsupervised Learning of Object Landmarks through Conditional Image Generation

Neural Information Processing Systems

Summary: This paper proposes a method for conditional image generation by jointly learning "structure" points such as face and body landmarks. The authors propose to use a convolutional neural network with a modified loss to capture the image transformation and landmarks. They evaluate their approach on a set of datasets including CelebA, VoxCeleb, and Human3.6M. Positive: - The problem addressed is an important one, and the authors attempt to solve it using a well-engineered approach. Negatives: - The pre-processing, which takes heat maps, normalizes them into probabilities, and then uses a Gaussian kernel to produce the features, is a bit heuristic.
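
The heat-map step the reviewer calls heuristic can be sketched in a few lines. The following is a minimal pure-Python illustration, not the authors' implementation: it assumes a single landmark channel, a softmax as the normalization into probabilities, the probability-weighted mean as the landmark location, and an isotropic Gaussian of fixed width as the rendered feature map. All function names and the `sigma` parameter are hypothetical.

```python
import math

def spatial_softmax(heatmap):
    """Normalize a 2D score map into a probability map that sums to 1."""
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]

def expected_location(prob):
    """Probability-weighted mean (row, col): a smooth stand-in for argmax."""
    y = sum(i * p for i, row in enumerate(prob) for p in row)
    x = sum(j * p for row in prob for j, p in enumerate(row))
    return y, x

def gaussian_map(center, height, width, sigma=1.0):
    """Render an isotropic Gaussian bump centred on the landmark location."""
    cy, cx = center
    return [[math.exp(-((i - cy) ** 2 + (j - cx) ** 2) / (2 * sigma ** 2))
             for j in range(width)]
            for i in range(height)]

def landmark_bottleneck(heatmap, sigma=1.0):
    """Score map -> probabilities -> 2D location -> Gaussian feature map."""
    prob = spatial_softmax(heatmap)
    center = expected_location(prob)
    return center, gaussian_map(center, len(heatmap), len(heatmap[0]), sigma)

# Toy input (assumed for illustration): a 5x5 score map with one strong response.
scores = [[0.0] * 5 for _ in range(5)]
scores[1][3] = 20.0
center, rendered = landmark_bottleneck(scores)
```

Under these assumptions, only the two coordinates in `center` survive into `rendered`; everything else about the score map is discarded, which is one way to read the "tight bottleneck" that the abstract describes.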


Unsupervised Learning of Object Landmarks through Conditional Image Generation

Jakab, Tomas, Gupta, Ankush, Bilen, Hakan, Vedaldi, Andrea

Neural Information Processing Systems

We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the two examples differ by a viewpoint change and/or an object deformation. In order to factorize appearance and geometry, we introduce a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features. Compared to standard image generation problems, which often use generative adversarial networks, our generation task is conditioned on both appearance and geometry and thus is significantly less ambiguous, to the point that adopting a simple perceptual loss formulation is sufficient. We demonstrate that our approach can learn object landmarks from synthetic image deformations or videos, all without manual supervision, while outperforming state-of-the-art unsupervised landmark detectors.